New Hash Function Construction for Textual and Geometric Data Retrieval
Abstract
Techniques based on hashing are heavily used in many applications, e.g. information retrieval, geometry processing, chemical and medical applications, and even cryptography. Traditionally, hash functions are considered in the form h(v) = f(v) mod m, where m is a prime number and f(v) is a function over the element v, which is generally of "unlimited" dimensionality and/or of "unlimited" range of values. In this paper, a new approach to hash function construction is presented that offers unique properties for textual and geometric data. Textual data have a limited range of values (the alphabet size) and "unlimited" dimensionality (the string length), while geometric data have an "unlimited" range of values (usually (-∞, ∞)) but limited dimensionality (usually 2 or 3). The construction of the hash function differs for textual and geometric data, and the proposed construction has been verified on non-trivial data sets.
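The abstract states only the traditional form h(v) = f(v) mod m; it does not describe the new construction itself. The minimal sketch below illustrates that traditional form for both data types under assumed choices of f: a polynomial (Horner) hash for strings of arbitrary length, and a quantize-and-weight combination for 2D or 3D points with real-valued coordinates. The function names `hash_text` and `hash_point`, the base 31, the coordinate weights, the quantization step and the prime m = 1,000,003 are all hypothetical illustrations, not the authors' method.

```python
# Illustrative sketch of the TRADITIONAL form h(v) = f(v) mod m.
# Assumptions (not from the paper): base 31, the weights below,
# the quantization step, and the prime table size M.

M = 1_000_003  # assumed prime m


def hash_text(s: str, m: int = M, base: int = 31) -> int:
    """Polynomial string hash, reduced mod m.

    f(s) = sum(ord(c_i) * base**(n-1-i)); Horner's rule reduces after
    every character, so strings of "unlimited" length (dimensionality)
    never produce values larger than m * base + max(ord(c)).
    """
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % m
    return h


def hash_point(p: tuple[float, ...], m: int = M, step: float = 1e-3) -> int:
    """Hash a 2D or 3D point with coordinates in (-inf, inf).

    Each coordinate is quantized to an integer grid cell (assumed step),
    the cells are combined with assumed large odd weights, and the sum
    is reduced mod m. zip() uses only as many weights as coordinates.
    """
    weights = (73856093, 19349663, 83492791)  # hypothetical constants
    f = 0
    for w, x in zip(weights, p):
        f += w * round(x / step)
    return f % m  # Python's % yields a value in [0, m) even if f < 0


if __name__ == "__main__":
    print(hash_text("information retrieval"))
    print(hash_point((1.25, -3.5)))
    print(hash_point((0.1, 0.2, 0.3)))
```

The two functions reflect the contrast drawn above: text has a small value range but unbounded length, so the work lies in folding many symbols together, while geometric data have few coordinates but an unbounded real range, so each coordinate must first be mapped to an integer before the modulo step.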
Similar resources
A Unified Approach for Textual and Geometrical Information Retrieval
Textual and geometrical algorithms have been considered two separate fields. This is because textual data are discrete in principle and interpolation is not defined, as there is generally no metric, while geometrical data are considered discrete samples of continuous phenomena, geometrical surface etc. In this paper we present a unified approach to textual and geometrical da...
An Improved Hash Function Based on the Tillich-Zémor Hash Function
Using the idea behind the Tillich-Zémor hash function, we propose a new hash function. Our hash function is parallelizable, and its collision resistance is implied by a hardness assumption on a mathematical problem. It is also secure against known attacks. It is the most secure variant of the Tillich-Zémor hash function to date.
Identifying and Ranking the Important Textual and Paratextual Elements in Fiction Retrieval
Purpose: The purpose of this study is to identify the textual and paratextual elements used in retrieving fiction from the readers' perspective, in order to provide the most appropriate access points for readers and to improve access to fiction based on readers' needs. Method: The current research is an applied study in terms of purpose, applying a mixed method that was conducted using the ...
Efficient Hash Function for Duplicate Elimination in Dictionaries
Fast elimination of duplicate data is needed in many areas, especially in the context of textual data. A solution to this problem was recently found for geometrical data, using a hash function to speed up the process. A hash function is extremely efficient when incremental elimination is required, especially when processing large data sets. In this paper a new construction of the hash ...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, the matching function and the query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...